Task Allocation Strategy of Spatial Crowdsourcing Based on Deep Reinforcement Learning
NI Zhiwei1,2, LIU Hao1,2, ZHU Xuhui1,2, ZHAO Yang1,2, RAN Jiamin1,2
1. School of Management, Hefei University of Technology, Hefei 230009; 2. Key Laboratory of Process Optimization and Intelligent Decision-Making, Ministry of Education, Hefei University of Technology, Hefei 230009
Abstract: Traditional dynamic online task allocation strategies struggle to learn effectively from historical data and ignore the impact of current decisions on future revenue. Therefore, a task allocation strategy for spatial crowdsourcing based on deep reinforcement learning is proposed. Firstly, taking the maximization of long-term cumulative income as the objective function, the task assignment problem is modeled from the perspective of a single crowdsourcing worker as a Markov decision process, and it is transformed into solving the Q value of state-action pairs and finding a one-to-one matching between workers and tasks. Secondly, an improved deep reinforcement learning algorithm learns offline from historical task data to build a Q-value prediction model. Finally, during dynamic online allocation, the Q value produced by the model in real time serves as the edge weight of the KM (Kuhn-Munkres) algorithm, yielding an allocation that optimizes global cumulative returns. Comparative experiments on a real taxi trip dataset show that the proposed strategy increases long-term cumulative income when the number of workers is within a certain scale.
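The allocation step described above can be sketched in a minimal form: a learned model supplies a Q value for each worker-task pair, and those values become the edge weights of a maximum-weight bipartite matching. In this sketch the Q-value predictor is a hypothetical stub (in the paper it is a deep network trained offline), and the KM algorithm is replaced by a brute-force search over permutations, which returns the same optimal one-to-one assignment for small instances.

```python
# Illustrative sketch only: q_value is a stand-in for the paper's
# offline-trained Q-value prediction model, and the exhaustive search
# below substitutes for the KM (Kuhn-Munkres) matching algorithm.
from itertools import permutations

def q_value(worker, task):
    # Hypothetical scoring function: higher Q for workers closer to a
    # task, mimicking a value estimate of accepting that task now.
    return 10.0 - abs(worker["x"] - task["x"]) - abs(worker["y"] - task["y"])

def assign(workers, tasks):
    # Build the edge-weight matrix q[i][j] for worker i and task j.
    q = [[q_value(w, t) for t in tasks] for w in workers]
    best_total, best_match = float("-inf"), None
    # Brute-force stand-in for KM: try every one-to-one mapping of
    # workers to tasks and keep the matching with the largest total Q.
    for perm in permutations(range(len(tasks)), len(workers)):
        total = sum(q[i][j] for i, j in enumerate(perm))
        if total > best_total:
            best_total, best_match = total, list(enumerate(perm))
    return best_total, best_match

workers = [{"x": 0, "y": 0}, {"x": 5, "y": 5}]
tasks = [{"x": 1, "y": 0}, {"x": 5, "y": 4}]
total, match = assign(workers, tasks)
print(match)  # list of (worker_index, task_index) pairs in the optimal matching
```

In practice the O(n!) search would be replaced by the KM algorithm, which finds the same maximum-weight matching in polynomial time, as the strategy requires for online allocation at scale.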
NI Zhiwei, LIU Hao, ZHU Xuhui, ZHAO Yang, RAN Jiamin. Task Allocation Strategy of Spatial Crowdsourcing Based on Deep Reinforcement Learning. 模式识别与人工智能 (Pattern Recognition and Artificial Intelligence), 2021, 34(3): 191-205.